L10: Vectors in GIS

Bogdan G. Popescu

John Cabot University

Spatial Data Classes in R

  • Spatial entries in GIS can be represented as a vector or as a raster

  • You see below how river can be represented as a vector (left) and as a raster (right)

Vector Layers

Vector layers are sets of geometries associated with non-spatial attributes

The geomteries are sequences of one or more coordinates, connected to form lines or polygons

The non-spatial attributes come as a table

Vector layers can be:

  • points
  • lines
  • polygons

Vector Layers

Points

Points are just dots that have a coordinate pair:

(i.e. latitude - X and longitudes - Y)

Points

Here are for example all the restaurants in central Rome

Polylines

Polylines are a sequence of two or more coordinate pairs - vertices.

Roads and rivers are typically stored as polylines in GIS

Polylines

Here are for example all the roads in central Rome

Polygons

Polygons are three or more line segments whose starting and ending coordinate pairs are the same.

Countries are usually depicted as polygons

Polygons

Here are for example all the buildings in central Rome

Polygons

Here are for example all countries in the world

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

You can download shapefiles for different countries by going to https://gadm.org

Shapefiles for World

Shapefiles for World

Shapefiles for Subdivsions of the US

Shapefiles for Subdivsions of the US

Shapefiles for Subdivsions of the US

Shapefiles for Subdivsions of the US

Shapefiles for World

We can do the same for Italy

Shapefiles for World

Shapefiles for World

Shapefiles for World

Shapefiles for World

Shapefiles for World

Vector File Formats

The most common vector files are:

Type Format File extension
Binary ESRI Shapefile .shp, .shx, .dbf, .prj
GeoPackage (GPKG) .gpkg
Plain Text GeoJSON .json or .geojson
GPS Exchange Format (GPX) .gpx
Keyhole Markup Language (KML) .gpx
Spatial Databases PostgreSQL / PostGIS

The sp package

  • The first R package to establish a uniform vector layer class system was sp (2005-2023).

  • Together with rgdal (2003-2023) and rgeos (2011-2023), the sp package dominated the landscape of spatial analysis in R

Spatial Data Structures in sp

Class Geometry type Attributes
SpatialPoints Points -
SpatialPointsDataFrame Points data.frame
SpatialLines Lines -
SpatialLinesDataFrame Lines data.frame
SpatialPolygons Polygons -
SpatialPolygonsDataFrame Polygons data.frame

Caveat

  • We will avoid the use of sp, rgdal, and rgeos and focus on the sf package.

  • The rgdal package is deprecated starting with 2023.

  • You should avoid using those packages (i.e. you will get errors and maybe conflicts with other libraries).

  • However, it is important to be aware of these packages, given how many forums, articles, and dependencies have been using them.

The sf package

sf is the standard package for working with vector data in R.

It relies on external software components: GDAL, GEOS, and PROJ.

It contains three classes:

  • sfg — a single geometry
  • sfc - a geometry column: a set of sfg geometries with CRS information
  • sf - a layer which is an sfc column inside a data frame with non-spatial attributes.

The sf class

Here is a polygon layer with three features and six non-spatial attributes: BIR74, SID74, NWBIR74, BIR79, SID79, NWBIR79

The sf class

Mapping the layer

The sf class

Geometry

sfg is a single geometry which can be of multiple types

  • st_point
  • st_multipoint
  • st_linestring
  • st_multilinestring
  • st_polygon
  • st_multipolygon
  • st_geometrycollection

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

These are the seven most common Simple Feature geometry types

Geometry

The last type GEOMETRYCOLLECTION is rarely used

The Shapefile format does not support geometries of type GEOMETRYCOLLECTION

Geometry - Points

Let us create one point:

library(sf)
library(ggplot2)
pt1 = st_point(c(34.798443, 31.243288))

If we print the point, we get its WKT (Well-Known Text) representation

print(pt1)
POINT (34.79844 31.24329)

If we examine the class definition, we get:

class(pt1)
[1] "XY"    "POINT" "sfg"  

Geometry - Points

class(pt1)
[1] "XY"    "POINT" "sfg"  
  • XY is the two-dimensional geometry of the point
  • POINT is the geometry type (it could be POINT, MULTIPOINT, etc.)
  • sfg is the general class (i.e. simple feature geometry)

Geometry - Points

This is what the point looks like:

plot(pt1)

Geometry - Polygon

This is how we create a polygon

a <- st_polygon(list(cbind(c(0,0,7.5,7.5,0),c(0,-1,-1,0,0))))

If we print the polygon, we get its WKT representation

print(a)
POLYGON ((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0))

If we examine the class definition, we get:

class(a)
[1] "XY"      "POLYGON" "sfg"    

Geometry - Polygon

This is what that polygon looks like:

plot(a)

Geometry - Polygon 2

We can also make a more complicated polygon:

b = st_polygon(list(cbind(c(0,1,2,3,4,5,6,7,7,0),c(1,0,0.5,0,0,0.5,-0.5,-0.5,1,1))))
print(b)
POLYGON ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1))

This is what the class looks like:

class(b)
[1] "XY"      "POLYGON" "sfg"    

Geometry - Polygon 2

This is what that polygon looks like:

plot(b)

Combining Geometries

We can combine geomteries using the c function

This will result into a single MULTIPOLYGON geometry, composed of all the shapes in the input

ab <- c(a, b)
print(ab)
MULTIPOLYGON (((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0)), ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1)))

If we check the class of ab, we see that it is a MULTIPOLYGON

class(ab)
[1] "XY"           "MULTIPOLYGON" "sfg"         

Combining Geometries

This is what the combined polygon looks like:

plot(ab)

Intersecting Geometries

A new geometry can be calculated by applying various functions to existing ones.

i <- st_intersection(a, b)
print(i)
GEOMETRYCOLLECTION (POLYGON ((7 0, 7 -0.5, 6 -0.5, 5.5 0, 7 0)), LINESTRING (4 0, 3 0), POINT (1 0))

Notice the difference from the combined geometries: ab

ab <- c(a, b)
print(ab)
MULTIPOLYGON (((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0)), ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1)))

Also, notice the difference between classes:

class(i)
[1] "XY"                 "GEOMETRYCOLLECTION" "sfg"               
class(ab)
[1] "XY"           "MULTIPOLYGON" "sfg"         

Intersecting Geometries

This is what the intersected geometries look like:

i <- st_intersection(a, b)
plot(i)

Intersecting Geometries

This is how they differ from the combined geometries:

ab <- c(a, b)
plot(ab)

Geometry column - sfc

Let us create two more points: pt2 and pt3.

pt2 = st_point(c(34.812831, 31.260284))
pt3 = st_point(c(35.011635, 31.068616))

Geometry objects can be collected into a geometry column - sfc, which is done with st_sfc

A geometry column also contains a Coordinate Reference System

Geometry column - sfc

Four types of CRS definitions are acceptable

  • An EPSG code (e.g. 4326)
  • A crs object, as returned by the st_crs function
  • A PROJ4 string (e.g. '+proj=longlat +datum=WGS84 +no_defs')
  • A WKT string

If the crs argument is omitted, the default is to set an “undefined” CRS (i.e., NA).

EPSG Codes

It is ideal to use EPSG codes.

EPSG Geodetic Parameter Dataset is a public registry of geodetic datums, spatial reference systems, Earth ellipsoids, coordinate transformations and related units of measurement.

It came from a member of the European Petroleum Survey Group in 1985.

Different countries have different codes.

EPSG Codes

For example, the standard for the whole world is 4326.

The code for Italy is 6875.

This is from https://epsg.io

EPSG Geodetic Parameter Dataset is a public registry of geodetic datums, spatial reference systems.

EPSG Codes

EPSG Codes

EPSG Codes

EPSG Codes

Geometry column - sfc

We can combine pt1, pt2, and pt3.

Reminder: this is how we defined them:

pt1 = st_point(c(34.798443, 31.243288))
pt2 = st_point(c(34.812831, 31.260284))
pt3 = st_point(c(35.011635, 31.068616))

This is how we combine them:

combined <- st_sfc(pt1, pt2, pt3, crs = 4326)
combined
Geometry set for 3 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 34.79844 ymin: 31.06862 xmax: 35.01164 ymax: 31.26028
Geodetic CRS:  WGS 84
POINT (34.79844 31.24329)
POINT (34.81283 31.26028)
POINT (35.01163 31.06862)

Note the we added 4326 - the code for the whole world.

Geometry column - sfc

The combined points look like below

plot(combined)

Layer sf

We can add attributes to the geometry in the following way:

name<-c("Point 1", "Point 2", "Point 3")
name_df<-data.frame(name)
name_df
     name
1 Point 1
2 Point 2
3 Point 3

Layer sf

We can combine the attribute table name_df and the geometry column geom (sfc) using st_sf, resulting in a layer

layer <- st_sf(name_df, combined)
layer
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 34.79844 ymin: 31.06862 xmax: 35.01164 ymax: 31.26028
Geodetic CRS:  WGS 84
     name                  combined
1 Point 1 POINT (34.79844 31.24329)
2 Point 2 POINT (34.81283 31.26028)
3 Point 3 POINT (35.01163 31.06862)

Mapping the sf layer with plot

plot(layer)

Mapping the sf layer with ggplot

ggplot()+
  geom_sf(data=layer)+
  theme_bw()

Mapping the sf layer with ggplot

As you might imagine, we can also plot the polygons that we made previously with ggplot

Here is what ab looks like:

ggplot()+
  geom_sf(data=ab)+
  theme_bw()

Extracting Layer Components

Sometimes, we are interested in going the other way around and extract:

  • the geometry
  • the attribute table
  • the coordinates

Extracting Layer Components: the Geometry

The following command allows us to extract the geometry

st_geometry(layer)
Geometry set for 3 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 34.79844 ymin: 31.06862 xmax: 35.01164 ymax: 31.26028
Geodetic CRS:  WGS 84
POINT (34.79844 31.24329)
POINT (34.81283 31.26028)
POINT (35.01163 31.06862)

Extracting Layer Components: the Attribute Table

The following command allows us to drop the geometry

We are then left just with the attribute table:

st_drop_geometry(layer)
     name
1 Point 1
2 Point 2
3 Point 3

Extracting Layer Components: the Coordinates

The coordinates of sf, sfc or sfg objects can be obtained with the function st_coordinates.

The coordinates are a matrix:

st_coordinates(layer)
            X        Y
[1,] 34.79844 31.24329
[2,] 34.81283 31.26028
[3,] 35.01163 31.06862

We can turn the matrix into a dataframe in the following way

data.frame(st_coordinates(layer))
         X        Y
1 34.79844 31.24329
2 34.81283 31.26028
3 35.01163 31.06862

Creating a Point Layer from a table

We can create point layers from tables.

Here is an example below:

Creating a Point Layer from a table

We can create point layers from tables.

Here is an example below:

df <- read.table("./data/restaurants_rome.csv", header = TRUE, sep = ",")
head(df, n=3)
                           name                 addr.street      lat      lon
1             Pizzeria ai Marmi         Viale di Trastevere 41.88826 12.47379
2                 Sichuan Haozi Via di San Martino ai Monti 41.89580 12.49948
3 Dar filettaro a Santa Barbara           Largo dei Librari 41.89467 12.47370

Creating a Point Layer from a table

We can create point layers from tables.

Here is an example below:

df <- read.table("./data/restaurants_rome.csv", header = TRUE, sep = ",")
head(df, n=3)
                           name                 addr.street      lat      lon
1             Pizzeria ai Marmi         Viale di Trastevere 41.88826 12.47379
2                 Sichuan Haozi Via di San Martino ai Monti 41.89580 12.49948
3 Dar filettaro a Santa Barbara           Largo dei Librari 41.89467 12.47370

The restaurants dataframe can be converted to a sf layer using st_as_sf:

df2 = st_as_sf(df, coords = c('lon', 'lat'))

Creating a Point Layer from a table

We can create point layers from tables.

Here is an example below:

df <- read.table("./data/restaurants_rome.csv", header = TRUE, sep = ",")
head(df, n=3)
                           name                 addr.street      lat      lon
1             Pizzeria ai Marmi         Viale di Trastevere 41.88826 12.47379
2                 Sichuan Haozi Via di San Martino ai Monti 41.89580 12.49948
3 Dar filettaro a Santa Barbara           Largo dei Librari 41.89467 12.47370

The restaurants dataframe can be converted to a sf layer using st_as_sf:

df2 = st_as_sf(df, coords = c('lon', 'lat'))
st_crs(df2)<-st_crs(4326)

Creating a Point Layer from a table

We can create point layers from tables.

Here is an example below:

df <- read.table("./data/restaurants_rome.csv", header = TRUE, sep = ",")
head(df, n=3)
                           name                 addr.street      lat      lon
1             Pizzeria ai Marmi         Viale di Trastevere 41.88826 12.47379
2                 Sichuan Haozi Via di San Martino ai Monti 41.89580 12.49948
3 Dar filettaro a Santa Barbara           Largo dei Librari 41.89467 12.47370

The restaurants dataframe can be converted to a sf layer using st_as_sf:

df2 = st_as_sf(df, coords = c('lon', 'lat'))
st_crs(df2)<-st_crs(4326)
head(df2, n=3)
Simple feature collection with 3 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 12.4737 ymin: 41.88826 xmax: 12.49948 ymax: 41.8958
Geodetic CRS:  WGS 84
                           name                 addr.street
1             Pizzeria ai Marmi         Viale di Trastevere
2                 Sichuan Haozi Via di San Martino ai Monti
3 Dar filettaro a Santa Barbara           Largo dei Librari
                   geometry
1 POINT (12.47379 41.88826)
2  POINT (12.49948 41.8958)
3  POINT (12.4737 41.89467)

Creating a Point Layer from a table

This is what these points look like, if we map them out:

ggplot()+
  geom_sf(data=df2, size=0.3)

Creating a Point Layer from a table

Without some context, it is difficult to see what those points

You can see below an updated map with the streets in the background.

sf properties: rows and columns

Some important properties of sf objects include rows and columns.

Number of rows:

nrow(df2)
[1] 2811

Number of columns:

ncol(df2)
[1] 3

We can obtain both rows and column by typing:

dim(df2)
[1] 2811    3

sf properties: bounding box coordinates

The st_bbox function returns the bounding box coordinates

st_bbox(df2)
    xmin     ymin     xmax     ymax 
12.21167 41.70574 12.77428 42.06974 

sf properties: crs

The st_crs function returns the Coordinate Reference System (CRS)

st_crs(df2)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Subsetting based on attributes

Selecting only the restaurants that have an address:

rest_noaddress<-subset(df2, !is.na(addr.street))

Subsetting based on attributes

Selecting only the restaurants that have an address:

rest_noaddress<-subset(df2, !is.na(addr.street))
library(ggpubr)
f1<-ggplot()+
  geom_sf(data=df2, size=0.3)

Subsetting based on attributes

Selecting only the restaurants that have an address:

rest_noaddress<-subset(df2, !is.na(addr.street))
library(ggpubr)
f1<-ggplot()+
  geom_sf(data=df2, size=0.3)

f2<-ggplot()+
  geom_sf(data=rest_noaddress, size=0.3)

Subsetting based on attributes

Here is how the two dataframes compare:

ggarrange(f1, f2)

Subsetting based on attributes

Subsetting columns is very similar to subsetting a dataframe.

The difference is the geometry column that stays.

We turn the spatial dataframe into a regular dataframe.

df3<-st_drop_geometry(df2)

Subsetting based on attributes

This is how the two compare:

head(df2, n=3)
Simple feature collection with 3 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 12.4737 ymin: 41.88826 xmax: 12.49948 ymax: 41.8958
Geodetic CRS:  WGS 84
                           name                 addr.street
1             Pizzeria ai Marmi         Viale di Trastevere
2                 Sichuan Haozi Via di San Martino ai Monti
3 Dar filettaro a Santa Barbara           Largo dei Librari
                   geometry
1 POINT (12.47379 41.88826)
2  POINT (12.49948 41.8958)
3  POINT (12.4737 41.89467)
head(df3, n=3)
                           name                 addr.street
1             Pizzeria ai Marmi         Viale di Trastevere
2                 Sichuan Haozi Via di San Martino ai Monti
3 Dar filettaro a Santa Barbara           Largo dei Librari

Reading vector layers

As discussed, the most common vector files are:

Type Format File extension
Binary ESRI Shapefile .shp, .shx, .dbf, .prj
GeoPackage (GPKG) .gpkg
Plain Text GeoJSON .json or .geojson
GPS Exchange Format (GPX) .gpx
Keyhole Markup Language (KML) .gpx
Spatial Databases PostgreSQL / PostGIS

Reading vector layers

Here is how we would would read all these different files:

Reading a shape file

usa_cntry <- st_read(dsn="./data/usa_database/shape_files/gadm41_USA_0.shp", quiet = TRUE)

Reading vector layers

Reading a geojson file

usa_cntry = st_read('data/usa_database/geojson/gadm41_USA_0.geojson', quiet = TRUE)

Reading vector layers

Reading a from a geodatabase

usa_cntry <- st_read(dsn="./data/usa_database/usa_geodatabase.gdb",layer="gadm41_USA_0", quiet = TRUE)

Mapping the files

Conclusion

  • Today we learned about vector layers: points, lines, polygons,
  • We also learned about sources for country polygons: https://gadm.org
  • We also learned about common vector files
  • We learned about combining and intersecting geometries
  • We also learned about EPSG codes
  • We also learned about extracting layer components: geometry, attribute table, and coordinates.